perm filename VIS[0,BGB]1 blob sn#062479 filedate 1973-09-19 generic text, type C, neo UTF8
2.1	Introduction to Computer Vision Theory.

	In this chapter, two levels of  theory are interleaved. There
is a general  theory, which is my interpretation of the overall state
of the  art of computer  vision; and  there is  a particular  theory,
which has inspired this work.

	The word "theory", as used here, means simply a concise body
of statements presenting a systematic view of a subject. Specifically,
I wish to exclude the connotations that the theory is a mathematical
theory or a natural theory; it must therefore be an "artificial
theory" or, in more conventional terms, a philosophy of computer
vision. The formal statements of the theory will be made in {BOLDFACE}
and will be either definitions, notions or assumptions.

	The validity of a natural theory is tested by experiment; a
mathematical theory is judged valid by its consistency; an artificial
theory is validated by the successful design and production of the
intended artifact. However, when there is a need to compare
unvalidated theories, the grounds for such comparison are the usual
means of verbal discourse: analogy, anecdote, scenario, philosophical
argument and plausible reasoning.

2.2     A Preview of My Computer Vision Theory.

	(vision and AI). Given a computer with several television
cameras, two mechanical arms and a radio controlled cart, the overall
problem is to write a program that can see and that can act
intelligently with respect to the physical world. In my opinion, the
simplest such vision robot programs involve vision, world modeling,
and goal seeking; secondary and minor roles are played by language,
logic, and problem solving.

	(the vision loop). Computer vision is the inverse of computer
graphics. The problem of computer graphics is to synthesize images
from three dimensional models; the problem of computer vision is to
analyze images into three dimensional models. The overall structure
of a general purpose computer vision system is that of a "feedback"
loop between 2-D images and a 3-D world model. Depending on
circumstances, the vision loop should be able to run almost entirely
top-down (verification vision) or bottom-up (revelation vision).
Verification vision is all that is required in a well known and
consequently predictable environment, whereas revelation vision is
required in a brand new or rapidly changing environment.

	(the  nature  of  images). There  are  three  basic kinds  of
information in  a 2D  visual  image: photometric,   geometric,    and
topological;  also  there  are  four  kinds  of  2D  images:  raster,
contour,    mosaic, and  feature.  The traditional  subject  of image
processing  involves  the study  and  development  of  programs  that
enhance,   transform  and compare  2D images.  Nearly all  such image
processing work can be subsumed into computer vision.

	(locus solving). The crux of computer vision, however, is to
deduce information about the world being viewed from images of that
world. I believe that the world information most directly relevant is
the physical location, extent and light scattering properties of
solid opaque objects; the location, orientation and scales of the
cameras that take the pictures; and the location and nature of the
lights that illuminate the world. Accordingly, three central themes
of my theory are body locus solving, camera solving, and sun solving.
The macroscopic world doesn't change very rapidly; between any two
world states there is an intermediate world state. Parallax is the
principal means of depth perception. Parallax is the alchemist that
converts 2D images into 3D models. Revelation vision is a process of
comparing perceived images taken in sequence and constructing a 3D
model of the unanticipated objects.
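
	As a small illustration of how parallax turns 2D measurements
into depth, the following Python sketch assumes an idealized pair of
parallel pinhole cameras separated by a known baseline. The function
name, its parameters and the rectified two-camera geometry are
assumptions made for this example only; they are not the locus
solving method developed in this work.

```python
def depth_from_parallax(focal_length_px, baseline, x_left_px, x_right_px):
    """Depth of a point seen by two parallel pinhole cameras.

    Hypothetical helper for illustration. Assumes both cameras share the
    same focal length (in pixels), their optical axes are parallel, and
    they are displaced sideways by `baseline` (in world units).
    """
    # Parallax (disparity): horizontal shift of the point between the images.
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("a point in front of both cameras has positive disparity")
    # Similar triangles: depth / baseline = focal_length / disparity.
    return focal_length_px * baseline / disparity


# A point that shifts 20 pixels between views taken 0.2 units apart.
depth = depth_from_parallax(500.0, 0.2, 320.0, 300.0)
```

	Under these assumptions a 20 pixel shift, a 500 pixel focal
length and a baseline of 0.2 units place the point at a depth of 5
units; a larger shift means a nearer point.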


	(recognition). Recognition involves comparing perceived data
with predicted data; such comparison can be done on any of the four
types of 2D images or on the 3D models. Arcane recognition techniques
can be avoided by improving the prediction and the analysis so that
matches are nearly obvious.

	(the nature of worlds). The rules about the world that can be
assumed a priori by a programmer are the laws of physics; programming
a simulation of the mundane physical world to a given approximation
is difficult.

	The remainder of this chapter  is devoted to elaborating  and
defending this theory.
2.3	Computer Vision and Artificial Intelligence.
	
	At one extreme, computer vision may be described as merely
the problem of getting visual input hardware properly connected to a
computer; once the computer can "see" a raster of intensities in its
memory, the rest of the problem is artificial intelligence. The other
extreme is harder to depict because it requires figuring out where to
draw the line between vision software and other software.

Notion:		The Top-Down and Bottom-Up in Computer Vision.
		The vision sensor hardware is the "bottom";
		visual software and intelligence is the "top".

	Normal vision should not be an Artificial Intelligence
problem, in the sense that it will not involve searching a large
space of possibilities or solving abstract problems.

"The history of progress in the development  of systems for automatic
symbolic   integration  poses  an  interesting   question  about  the
definition of artificial intelligence. Few would argue  that Slagle's
SAINT  program was  a  product of  artificial intelligence  research.
Moses'  SIN program for symbolic integration  seldom needed to resort
to search,  and for  this reason some  people consider  it much  more
powerful (intelligent ?) than  SAINT. Now, Risch (1969) has developed
an  algorithm  for  integrating  many  types  of  expressions.  Risch
considers himself  a  mathematician, not  an artificial  intelligence
researcher.  In your opinion  should Risch's  algorithm be considered
part of the subject matter of artificial intelligence ? If  you would
exclude Risch from artificial intelligence, how would you respond to
the  statement  that  every  artificial  intelligence  program  might
eventually  be dominated  by  a  (more intelligent?)  non  artificial
intelligence algorithm?  If you would  include Risch, would  you also
include the long-division algorithm?"

			- Nils J. Nilsson, problem 4-5;
			Problem-Solving Methods in Artificial Intelligence.

	(Intellectual Entities). The larger context of a vision
theory depends on one's opinion about the nature of conscious
intelligent animals, men and robots. It is my opinion that mind is to
matter as computer software is to computer hardware. That is, mind is
a program that is running in the brain. Well now, what software can
account for consciousness, the inner private life of the self that
burns in our heads? The so-called stream of consciousness consists of
little voice(s) talking, fragments of music playing, and, most
important, the flow of the here and now. Your here and now is the
totality of the particular sights, sounds, smells, and so on that are
being played in your head in time with the respective sensor stimuli.
So it is my opinion that the major computation being performed by an
intellectual entity in order to stay conscious of its external world
is a reality simulation.
	"The design,  implementation, and use  of the  robot hardware
presents  some   difficult,  and  often  expensive,  engineering  and
maintenance problems. If  one is to  work in  this area solving  such
problems  is   a  necessary  prelude   but,  more  often   than  not,
unrewarding  because the activity  does not address  the questions of
A.I. research that motivate the project. Why, then, build devices?
Why not simulate  them and their environment? In  fact, the SRI group
has done  good work  in simulating  a  version of  their robot  in  a
simplified environment. The  answer given is  as follows. It  is felt
by  the  SRI  group  that  the  most  unsatisfactory  part  of  their
simulation effort was  the simulation of  the environment. Yet,  they
say that  90% of  the effort  of the simulation  team went  into this
part  of  the  simulation. It  turned  out to  be  very  difficult to
reproduce in an internal representation for a  computer the necessary
richness of environment that  would give rise to interesting behavior
by the highly adaptive robot. It is easier and cheaper to build a
hardware robot  to extract what  information it  needs from the  real
world  than to organize  and store a  useful model.  Crudely put, the
SRI group's argument  is that the most  economic and efficient  store
of information about the real world is the real world itself."

					- E. A. Feigenbaum [ref. X].
2.4	The Vision Loop.

Assumption:	The overall structure of a general purpose computer 
		vision system is that of a "feedback" loop 
		between 2-D images and a 3-D world model.

Alternatives:	1. Computer vision is structured like a compiler.
		2. Computer vision is structured in terms of
		   discrimination functions.

Assumption:	Computer vision is both top down and bottom up.

Alternatives:	1. Computer vision is mostly top down.
		2. Computer vision is mostly bottom up.

Computer vision is the inverse of computer graphics. The problem of
computer graphics is to synthesize images from three dimensional
models; the problem of computer vision is to analyze images into
three dimensional models.

	
Vision loop terminology..............................................

	1. PREDICT	3D → 2D		synthesis	Verification
	2. PERCEIVE	2D → 3D		analysis	Revelation
	3. COMPARE			recognition
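
	The three steps can be sketched as a toy Python program. In
this sketch prediction synthesizes image features from the 3-D world
model, and comparison splits the perceived features into those that
verify the model and those that are surprises calling for revelation.
Everything here is an assumption made for illustration: the pinhole
projection, the named point features, the tolerance, and the function
names `project`, `predict` and `compare`. A real perceive step would
also have to solve the correspondence between features, which the
shared names quietly assume away.

```python
FOCAL = 1.0  # assumed focal length of an idealized pinhole camera


def project(point):
    """Project a 3-D point (x, y, z) onto the image plane."""
    x, y, z = point
    return (FOCAL * x / z, FOCAL * y / z)


def predict(world_model):
    """PREDICT: synthesize expected image features from the world model."""
    return {name: project(p) for name, p in world_model.items()}


def compare(predicted, perceived, tolerance=0.05):
    """COMPARE: split perceived features into verified ones and surprises."""
    verified, surprises = [], []
    for name, (u, v) in perceived.items():
        if name in predicted:
            pu, pv = predicted[name]
            if abs(u - pu) <= tolerance and abs(v - pv) <= tolerance:
                verified.append(name)
                continue
        # Unpredicted or badly predicted features need bottom-up analysis.
        surprises.append(name)
    return verified, surprises


# Hypothetical world model and hypothetical perceived image features.
world_model = {"cube": (0.0, 0.0, 2.0), "wedge": (1.0, 0.0, 4.0)}
perceived = {"cube": (0.0, 0.0), "wedge": (0.25, 0.0), "ball": (0.5, 0.5)}

verified, surprises = compare(predict(world_model), perceived)
```

	With this data the cube and the wedge are verified and the
ball is a surprise. When nothing is a surprise the loop runs nearly
pure top down; when the model is empty every feature is a surprise
and the loop runs nearly pure bottom up.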

Description of nearly pure top down vision...........................

Description of nearly pure bottom up vision..........................

2.5	The Nature of Images.

Assumption:	Computer vision is based on digitized television images.

Alternatives:	1. Active 3-D imaging device.
		2. Non-light devices: sound, radar, neutrinos, etc.

	A super intellectual entity, however, would have eyes that could see
the whole electromagnetic spectrum from gamma radiation to direct current,
as well as "voices" that could broadcast on any and all frequencies.

Notion:		An image contains three basic kinds of data:
		topological data, geometric data, and photometric data.

2.X	A Notion of Computer Vision.

Assumption:	I will use a real computer capable of
		taking real images of the real world.

Alternatives:	1. ...use a real computer and simulated images.
		2. ...think about using a computer.
		3. Study biological vision systems.
2.6	Locus Solving.
2.6.1	Camera Locus Solving.
2.6.2	Body Locus Solving.
		Silhouette Cone Intersection.
		Envelope bodies.
2.6.3	Sun Locus Solving.
		(compute it, look at it, shine and shadows).
2.7	Recognition.
2.8	The Nature of Worlds.

Assumption:	The world model should be a 3-D geometric model.

Alternatives:	1. Image memory and 2-D models.
		2. Procedural knowledge.
		3. Semantic knowledge.
		4. Formal Logic models.

	(On Partial Knowledge).

Assumption:	Partial knowledge should be represented by approximation.
Alternatives:	1. Tree of possibilities.
		2. Multi valued logic.
		3. Probabilities.

(Alternate world models).
(Reality Simulation).

"For the purpose of  presenting my argument I must  first explain the
basic  premise of  sorcery as don  Juan presented  it to me.  He said
that for a sorcerer, the world  of everyday life is not real, or  out
there, as we believe  it is. For a sorcerer, reality  or the world we
all know, is only a description. For the sake of validating this
premise don Juan  concentrated the best  of his efforts into  leading
me to a genuine conviction that what I held in mind as the world at
hand was merely a  description of the world;  a description that  had
been pounded into me from the moment I was born."

			- Carlos Castaneda, Journey to Ixtlan.
	
2.9	Related Vision Work.
		Stanford Hand/Eye
		SRI - Hart & Duda.
		MIT Guzman, Waltz